Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
Abstract
In this paper, we introduce a framework that identifies online plagiarism by exploiting lexical, syntactic, and semantic features, including duplication-gram, reordering and alignment of words, POS and phrase tags, and semantic similarity of sentences. We build an ensemble framework that combines the predictions of the individual models. Results show that our system not only finds a considerable number of real-world online plagiarism cases but also outperforms several state-of-the-art algorithms and commercial software.
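The abstract names its feature families and an ensemble but does not say how they are computed or combined. As a minimal sketch, assuming plain lowercase whitespace tokenization, the snippet below computes two of the lexical signals (a word n-gram "duplication" ratio and an order-aware alignment ratio from Python's `difflib.SequenceMatcher`) and merges them with a weighted average. All function names, `n=3`, the weights, and the 0.5 threshold are hypothetical illustrations, not the authors' settings; the POS/phrase-tag and semantic-similarity models are omitted because they would need an external tagger or embedding model.

```python
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of contiguous word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def duplication_gram_score(suspect: str, source: str, n: int = 3) -> float:
    """Fraction of the suspect's word n-grams that also occur in the source."""
    s_grams = ngrams(tokenize(suspect), n)
    if not s_grams:  # text shorter than n tokens yields no n-grams
        return 0.0
    return len(s_grams & ngrams(tokenize(source), n)) / len(s_grams)


def alignment_score(suspect: str, source: str) -> float:
    """Order-aware token similarity: 2 * matched tokens / total tokens."""
    return SequenceMatcher(None, tokenize(suspect), tokenize(source)).ratio()


def ensemble_score(scores: list[float], weights: list[float]) -> float:
    """Hypothetical combination rule: weighted average of per-model scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def is_plagiarized(suspect: str, source: str, threshold: float = 0.5) -> bool:
    """Flag a suspect/source pair when the combined score reaches the threshold."""
    scores = [duplication_gram_score(suspect, source), alignment_score(suspect, source)]
    return ensemble_score(scores, weights=[0.6, 0.4]) >= threshold
```

Because `SequenceMatcher` rewards tokens shared in the same order, it serves as a rough stand-in for the "reordering and alignment of words" signal: a word-shuffled copy scores lower on it than a verbatim copy. Each extra model from the abstract would simply contribute one more entry to `scores` and `weights`.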
Similar resources
GermaNet - A Lexical-Semantic Net For German
We present the lexical-semantic net for German "GermaNet" which integrates conceptual ontological information with lexical semantics, within and across word classes. It is compatible with the Princeton WordNet but integrates principle-based modifications on the constructional and organizational level as well as on the level of lexical and conceptual relations. GermaNet includes a new treatment o...
The two be's of English
This qualitative study investigates the uses of be in Contemporary English. Based on this study, one easy claim and one more difficult claim are proposed. The easy claim is that the traditional distinction between be as a lexical verb and be as an auxiliary is faulty. In particular, 'copular-be', traditionally considered to be a lexical verb, is in fact a prototypi...
Reverse Engineering of Network Software Binary Codes for Identification of Syntax and Semantics of Protocol Messages
Reverse engineering of network applications, especially from the security point of view, is of high importance and interest. Many network applications use proprietary protocols whose specifications are not publicly available. Reverse engineering of such applications could provide us with vital information to understand their embedded unknown protocols. This could facilitate many tasks including d...
Paraphrase Recognition using Neural Network Classification
Paraphrasing refers to conveying the same content in several ways. The successful recognition of paraphrases is crucial to various natural language processing tasks such as Information Extraction, Document Summarization, Question Answering etc. Several techniques have been employed for paraphrase recognition using lexical, syntactic and semantic features. Many of these systems have been tested ...
Babelplagiarism: What can BabelNet do for Cross-language Plagiarism Detection?
In the first part of the talk, I will present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, Machine Translation is applied to enrich the knowledge resource with lexical information for all languages. We p...
Publication date: 2012